# Zero-shot image classification

The checkpoints below are listed with publisher, license, tags, download count, and like count.

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| OPENCLIP SigLIP Tiny 14 Distill SigLIP 400m Cc9m | PumeTu | MIT | Image Classification | 30 | 0 | Lightweight SigLIP-style vision-language model distilled from the larger SigLIP-400m model, suitable for zero-shot image classification. |
| Clip Backdoor Vit B16 Cc3m Blto Cifar | hanxunh | MIT | Text-to-Image, English | 9 | 0 | Pre-trained model for research on backdoor-sample detection in contrastive language-image pre-training; contains the BLTO backdoor trigger. |
| Vit Gopt 16 SigLIP2 384 | timm | Apache-2.0 | Text-to-Image | 1,953 | 1 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 16 SigLIP2 512 | timm | Apache-2.0 | Text-to-Image | 1,191 | 4 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 16 SigLIP2 384 | timm | Apache-2.0 | Text-to-Image | 106.30k | 2 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 16 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 998 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 14 SigLIP2 378 | timm | Apache-2.0 | Text-to-Image | 1,596 | 1 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit L 16 SigLIP2 512 | timm | Apache-2.0 | Text-to-Image | 147 | 2 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit L 16 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 888 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 16 SigLIP2 512 | timm | Apache-2.0 | Text-to-Image | 1,442 | 1 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 16 SigLIP2 384 | timm | Apache-2.0 | Text-to-Image | 1,497 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 32 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 691 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 16 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 10.32k | 4 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Siglip2 So400m Patch14 384 | google | Apache-2.0 | Image-to-Text, Transformers | 622.54k | 20 | SigLIP 2 vision-language model built on the SigLIP pre-training objective, adding techniques that improve semantic understanding, localization, and dense feature extraction. |
| Siglip2 So400m Patch14 224 | google | Apache-2.0 | Image-to-Text, Transformers | 23.11k | 0 | Improved multilingual vision-language encoder based on SigLIP, with stronger semantic understanding, localization, and dense feature extraction. |
| Siglip2 Large Patch16 512 | google | Apache-2.0 | Text-to-Image, Transformers | 4,416 | 8 | Improved SigLIP-based model integrating multiple techniques that enhance semantic understanding, localization, and dense feature extraction. |
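
The SigLIP 2 checkpoints above can be used for zero-shot classification through the Hugging Face transformers pipeline. A minimal sketch, assuming the `google/siglip2-so400m-patch14-384` repo id (derived from the "Siglip2 So400m Patch14 384" entry) and a local test image:

```python
from transformers import pipeline

# Model id assumed from the "Siglip2 So400m Patch14 384" entry above;
# swap in any other SigLIP 2 checkpoint from the table.
classifier = pipeline(
    task="zero-shot-image-classification",
    model="google/siglip2-so400m-patch14-384",
)

# The pipeline embeds the image and each candidate label, then scores them.
results = classifier(
    "cat.jpg",  # path or URL to a test image (assumed)
    candidate_labels=["a photo of a cat", "a photo of a dog", "a diagram"],
)
print(results)  # list of {"label": ..., "score": ...}, highest score first
```

Because the class set is defined entirely by the candidate labels, no fine-tuning is needed to classify against a new label set.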
OpenCLIP and EVA-CLIP checkpoints trained on LAION and merged datasets:

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| CLIP ViT H 14 Laion2b S32b B79k | ModelsLab | MIT | Text-to-Image | 132 | 0 | OpenCLIP vision-language model trained on the English subset of LAION-2B; strong at zero-shot image classification and cross-modal retrieval. |
| CLIP ViT B 32 Laion2b S34b B79k | recallapp | MIT | Text-to-Image | 17 | 0 | OpenCLIP vision-language model trained on the English LAION-2B subset, supporting zero-shot image classification and cross-modal retrieval. |
| Eva Giant Patch14 Clip 224.laion400m | timm | MIT | Text-to-Image | 124 | 0 | EVA-CLIP vision-language model usable from both OpenCLIP and timm, supporting zero-shot image classification. |
| Eva02 Large Patch14 Clip 224.merged2b | timm | MIT | Image Classification | 165 | 0 | EVA-CLIP vision-language model with timm-format weights built on OpenCLIP, supporting tasks such as zero-shot image classification. |
| Eva02 Enormous Patch14 Clip 224.laion2b Plus | timm | MIT | Text-to-Image | 54 | 0 | Large-scale EVA-CLIP vision-language model based on the CLIP architecture, supporting tasks such as zero-shot image classification. |
| Eva02 Enormous Patch14 Clip 224.laion2b | timm | MIT | Text-to-Image | 38 | 0 | EVA-CLIP vision-language model based on the CLIP architecture, supporting zero-shot image classification. |
| Eva02 Base Patch16 Clip 224.merged2b | timm | MIT | Text-to-Image | 3,029 | 0 | EVA-CLIP vision-language model built on the OpenCLIP and timm frameworks, supporting tasks such as zero-shot image classification. |
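
The LAION-trained CLIP and EVA-CLIP entries follow the standard OpenCLIP interface. A minimal zero-shot classification sketch, assuming the `ViT-B-32` architecture with the `laion2b_s34b_b79k` pretrained tag (matching the "CLIP ViT B 32 Laion2b S34b B79k" entry) and a local `cat.jpg`:

```python
import torch
from PIL import Image
import open_clip

# Architecture and pretrained tag assumed from the entry name above.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # test image (assumed)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability per text prompt for the image
```

The softmax over scaled cosine similarities is what makes this "zero-shot": the classifier is defined entirely by the text prompts.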
CLIP image encoders distributed through timm:

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Vit Huge Patch14 Clip Quickgelu 378.dfn5b | timm | Other | Image Classification, Transformers | 27 | 0 | ViT-Huge CLIP image encoder trained on the DFN-5B dataset, using the QuickGELU activation. |
| Vit Huge Patch14 Clip 378.dfn5b | timm | Other | Image Classification, Transformers | 461 | 0 | The ViT-Huge visual encoder of DFN5B-CLIP, trained at 378x378 resolution. |
| Vit Base Patch16 Clip 224.dfn2b | timm | Other | Image Classification, Transformers | 444 | 0 | Vision Transformer CLIP image encoder carrying the DFN2B-CLIP weights released by Apple. |
| Vit Base Patch32 Clip 256.datacompxl | timm | Apache-2.0 | Image Classification, Transformers | 89 | 0 | CLIP Vision Transformer for image feature extraction, accepting 256x256 inputs. |
| Vit Base Patch32 Clip 224.datacompxl | timm | Apache-2.0 | Image Classification, Transformers | 13 | 0 | CLIP Vision Transformer for image feature extraction, trained on the DataComp XL dataset. |
| Vit Base Patch16 Clip 224.datacompxl | timm | Apache-2.0 | Image Classification, Transformers | 36 | 0 | ViT-B/16 CLIP Vision Transformer for image feature extraction, trained on the DataComp XL dataset. |
| Convnext Xxlarge.clip Laion2b Soup | timm | Apache-2.0 | Image Classification, Transformers | 220 | 0 | ConvNeXt-XXLarge CLIP image encoder trained by LAION, suitable for multimodal tasks. |
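
The DFN and DataComp entries above are image towers rather than complete image-text pairs, so through timm they are typically used for feature extraction. A sketch, assuming the `vit_base_patch16_clip_224.dfn2b` model name (lowercased from the entry above):

```python
import timm
import torch
from PIL import Image

# Model name assumed from the "Vit Base Patch16 Clip 224.dfn2b" entry.
# num_classes=0 drops the classification head, so the forward pass
# returns pooled image embeddings instead of logits.
model = timm.create_model(
    "vit_base_patch16_clip_224.dfn2b", pretrained=True, num_classes=0
)
model.eval()

# Build the preprocessing transform matching the model's training config.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("cat.jpg").convert("RGB")  # test image (assumed)
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # shape (1, embed_dim)

print(features.shape)
```

The resulting embeddings can feed a downstream linear probe, a retrieval index, or any other multimodal pipeline.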
MetaCLIP, Long-CLIP, LLM2CLIP, and LAION-400M checkpoints:

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Vit Huge Patch14 Clip 224.metaclip Altogether | timm | — | Image Classification | 171 | 1 | CLIP model based on the ViT-Huge architecture, supporting zero-shot image classification. |
| Longclip SAE ViT L 14 | zer0int | — | Text-to-Image, Safetensors | 290 | 18 | Long-CLIP model fine-tuned with a sparse autoencoder (SAE), supporting long text inputs and tuned for text-image alignment. |
| LLM2CLIP EVA02 L 14 336 | microsoft | Apache-2.0 | Text-to-Image, PyTorch | 75 | 60 | LLM2CLIP strengthens CLIP's visual representations with large language models, substantially improving cross-modal performance. |
| Vit Gigantic Patch14 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 444 | 0 | Vision model trained on the MetaCLIP-2.5B dataset, usable from both the OpenCLIP and timm frameworks. |
| Vit Huge Patch14 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 3,173 | 0 | Vision-language model trained on the MetaCLIP-2.5B dataset, supporting zero-shot image classification. |
| Vit Large Patch14 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 2,648 | 0 | Vision model trained on the MetaCLIP-2.5B dataset, usable from both frameworks and supporting zero-shot image classification. |
| Vit Large Patch14 Clip 224.metaclip 400m | timm | — | Image Classification | 294 | 0 | Vision Transformer trained on the MetaCLIP-400M dataset, supporting zero-shot image classification. |
| Vit Large Patch14 Clip 224.laion400m E32 | timm | MIT | Image Classification | 1,208 | 0 | Large Vision Transformer trained on LAION-400M, supporting zero-shot image classification. |
| Vit Base Patch16 Clip 224.laion400m E31 | timm | MIT | Image Classification | 1,469 | 0 | Vision Transformer trained on LAION-400M, supporting zero-shot image classification. |
| Vit Base Patch32 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 5,571 | 0 | Vision Transformer trained on the MetaCLIP-2.5B dataset, compatible with both open_clip and timm. |
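
The MetaCLIP entries are published as dual-framework weights, so the same repository should also load through OpenCLIP's `hf-hub:` scheme. A sketch, assuming the `timm/vit_base_patch32_clip_224.metaclip_2pt5b` repo id (lowercased from the entry above):

```python
import open_clip

# Repo id assumed from the "Vit Base Patch32 Clip 224.metaclip 2pt5b" entry.
# create_model_from_pretrained returns the model plus its eval preprocessing.
model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:timm/vit_base_patch32_clip_224.metaclip_2pt5b"
)
tokenizer = open_clip.get_tokenizer(
    "hf-hub:timm/vit_base_patch32_clip_224.metaclip_2pt5b"
)
# encode_image / encode_text then behave exactly as in the
# OpenCLIP example earlier in this section.
```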